A Linguistic Investigation into Unsupervised DOP

Author

  • Rens Bod
Abstract

Unsupervised Data-Oriented Parsing (U-DOP) models are a class of structure-bootstrapping models that have achieved some of the best unsupervised parsing results in the literature. Although U-DOP was originally proposed as an engineering approach to language learning (Bod 2005, 2006a), the model has a number of properties that may also be of linguistic and cognitive interest. In this paper we focus on the original U-DOP model proposed in Bod (2005), which computes the most probable tree from among the shortest derivations of a sentence. We show that this U-DOP model can learn both rule-based and exemplar-based aspects of language, ranging from agreement and movement phenomena to discontiguous constructions, provided that productive units of arbitrary size are allowed. We argue that these results suggest a rapprochement between nativism and empiricism.

Similar resources

An All-Subtrees Approach to Unsupervised Parsing

We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and next use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We will test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator whic...
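
The subtree-extraction and relative-frequency steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator from the paper: trees are nested tuples over plain word strings, every internal node is unlabeled, the string "X" marks a substitution site, and the function names `fragments`, `all_fragments` and `relative_frequencies` are hypothetical. Because all nodes share one label, the relative frequency of a subtree reduces to its count divided by the total subtree count.

```python
from collections import Counter
from itertools import product


def fragments(node):
    """Return every subtree (fragment) rooted at the internal node `node`.

    Each child is either kept as a word, cut off and replaced by the
    substitution site "X", or expanded by one of its own fragments.
    Assumption: no word in the corpus is literally "X".
    """
    def options(child):
        if isinstance(child, str):      # a word: keep it as is
            return [child]
        return ["X"] + fragments(child)  # cut here, or expand further

    left, right = node
    return [(l, r) for l, r in product(options(left), options(right))]


def all_fragments(tree):
    """Collect the fragments rooted at every internal node of `tree`."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return fragments(tree) + all_fragments(left) + all_fragments(right)


def relative_frequencies(trees):
    """Relative frequency of each fragment over a collection of trees."""
    counts = Counter(f for t in trees for f in all_fragments(t))
    total = sum(counts.values())
    return {frag: n / total for frag, n in counts.items()}


# The two binary trees for the three-word string "the dog barks".
trees = [(("the", "dog"), "barks"), ("the", ("dog", "barks"))]
probs = relative_frequencies(trees)
```

Explicit enumeration like this grows exponentially with sentence length, which is why the abstract mentions sampling a large random subset of subtrees.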

Automating Construction Work: Data-Oriented Parsing and Constructivist Accounts of Language Acquisition

The constructionist approach to language has long proven its merits as a theoretical framework guiding linguistic observations. However, relatively little work has been dedicated to providing a precise, formalized definition of constructions and the mechanisms by means of which they are acquired. In giving an overview of recent work in Data-Oriented Parsing (DOP), we show how the theoretical de...

Exemplar-Based Syntax: How to Get Productivity from Examples

Exemplar-based models of language propose that human language production and understanding operate with a store of concrete linguistic experiences rather than with abstract linguistic rules. While exemplar-based models are well acknowledged in areas like phonology and morphology, common wisdom has it that they are intrinsically flawed for syntax where infinite generative capacity is needed. This...

Linguistic Constraints in LFG-DOP

LFG-DOP (Bod and Kaplan, 1998, 2003) provides an appealing answer to the question of how probabilistic methods can be incorporated into linguistic theory. However, despite its attractions, the standard model of LFG-DOP suffers from serious problems of overgeneration because (a) it is unable to define fragments of the right level of generality, and (b) it has no way of capturing the effect of an...

Unsupervised Parsing with U-DOP

We propose a generalization of the supervised DOP model to unsupervised learning. This new model, which we call U-DOP, initially assigns all possible unlabeled binary trees to a set of sentences and next uses all subtrees from (a large subset of) these binary trees to compute the most probable parse trees. We show how U-DOP can be implemented by a PCFG-reduction technique and report competitive...
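
The first step described above, assigning all possible unlabeled binary trees to a sentence, can be sketched as below. This is an illustrative sketch only: the nested-tuple tree representation and the function name `binary_trees` are assumptions made here, and it enumerates trees directly rather than using the PCFG-reduction technique the abstract mentions.

```python
def binary_trees(words):
    """Yield every unlabeled binary tree over `words` as nested tuples.

    A single word is its own tree; a longer span is split at every
    position, and each left tree is combined with each right tree.
    """
    words = tuple(words)
    if len(words) == 1:
        yield words[0]
        return
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                yield (left, right)


# All binary bracketings of a four-word sentence.
trees = list(binary_trees("the dog barks loudly".split()))
```

The number of trees for a sentence of n words is the Catalan number C(n-1), so the set grows very quickly with sentence length, which is consistent with the abstract's reference to using a large subset of the trees.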

Publication date: 2007